Introduction

This is an exploration into the following question:

How do public high schools police bodies differently?

As such, I have been surveying public high schools with available dress codes for the 2018-2019 school year (and no uniform policy) across the US. I found a list of potential schools using the National Center for Education Statistic’s search function for public schools. I limited the resulting schools to just those that had a web address listed. I then scraped the homepage content for each website and searched for words like handbook, dress code and code of conduct. I further filtered my list of schools to just those that contained one of the above phrases. Then I manually visited each of the resulting (2000+) websites to find the actual dress code and to verify that there was no uniform policy and that the dress code was from the 2018 - 2019 school year. I ended up with a list of 831 schools considered to be “regular”, “non-magnet”, “non-boarding” high schools that met my qualifications. I’ve been manually collecting information from them in this Google Form.

Now to process that data.

Loading packages

To start, I’ll only need a few packages, mostly to connect me with the Google Sheets where the data from my form is being collected.

library(tidyverse)
library(here)
library(maps)

Downloading Google Drive Data

I’ll download the data from Google Drive and set the overwrite permissions to TRUE so that I can re-run this analysis easily as I update more data.

googledrive::drive_download("Dress Code Responses", path = here::here("raw_data", "collected", "responses.csv"), type = "csv", overwrite = TRUE)

responses <- read.csv(here::here("raw_data", "collected", "responses.csv"), stringsAsFactors = FALSE, header = TRUE, na.strings = c("", " "))

Organizing the Data

Because of the way that Google Sheets works, we end up with very wide data. That is, each clothing item is listed as a column instead of having several rows per school. Therefore, my data needs to go from wide to long.

In order to do that, I need to first split out information about specific item length and strap width since these data were entered in a slightly different way than the rest.

Length

length <- responses %>% 
  select(c("School.Name", "School.State.Abbreviation"), contains("length.limit")) %>% 
  rename(limits = !!names(.[3]), length = !!names(.[4])) %>% 
  filter(limits != "") %>% 
  # Separate comma delimited list of items
  separate_rows(limits, sep = ",") %>% 
  mutate(limits = trimws(limits)) 

How many schools regulate the length of clothing items?

schools <- length %>% 
  distinct(School.Name) 

# Return count
nrow(schools)
## [1] 252
# Return percentage
nrow(schools) / nrow(responses)
## [1] 0.7455621

Alright, so about 74% of the schools that I’ve surveyed so far have some length limitations.

What are those length limitations?

lengthLimits <- length %>% 
  separate_rows(length, sep = ",") %>% 
  filter(!grepl("\\(?[0-9,.]+\\)?", length)) %>% 
  count(length, sort = TRUE) %>% 
  filter(n >= 10)

lengthLimits
## # A tibble: 8 x 2
##   length                                                                 n
##   <chr>                                                              <int>
## 1 shorter than fingertips                                              180
## 2 "\"short\"/\"inappropriate length\" (no exact measurement given)"    121
## 3 shorter than mid-thigh                                               116
## 4 shorter than x inches from knee (insert amount in other)             108
## 5 shorter than the knee                                                 36
## 6 " shorter than x inch inseam (insert amount in other)"                22
## 7 " shorter than x inches from knee (insert amount in other)"           11
## 8 " \"short\"/\"inappropriate length\" (no exact measurement given)"    10

What about by item type?

lengthByItem <- length %>% 
  count(limits, sort = TRUE) %>% 
  mutate(type = "length",
         limits = paste0("short ", limits)) %>% 
  rename(item = limits)

lengthByItem
## # A tibble: 10 x 3
##    item               n type  
##    <chr>          <int> <chr> 
##  1 short shorts     241 length
##  2 short skirts     208 length
##  3 short dresses    122 length
##  4 short skorts      12 length
##  5 short pants        7 length
##  6 short jumpers      4 length
##  7 short capris       1 length
##  8 short clothing     1 length
##  9 short shirts       1 length
## 10 short tunics       1 length

Strap Width

straps <- responses %>% 
  select(c("School.Name", "School.State.Abbreviation"), contains("shirt.straps")) %>% 
  rename(limits = !!names(.[3])) %>% 
  filter(limits != "") %>% 
  # Separate comma delimited list of items
  separate(limits, into = c("limits", "inches"), sep = ",") %>% 
  mutate(limits = trimws(limits)) 
## Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [27].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 19 rows [1,
## 11, 12, 18, 24, 31, 36, 38, 39, 45, 54, 55, 62, 72, 76, 77, 78, 79, 82].

How many about strap width?

schoolsStrap <- straps %>% 
  distinct(School.Name) 

# Return count
nrow(schoolsStrap)
## [1] 90
# Return percentage
nrow(schoolsStrap) / nrow(responses)
## [1] 0.2662722

Only 28% have some limitations on strap width explicitly laid out in the handbook.

Banned Items

In order to figure out all of the banned items, we need to eliminate some columns of unneccessary information.

banned <- responses %>% 
  select(-contains("sanctions"), - contains("shirt.straps"), -contains("length.limit"))
elongateCode <- function(type){

  if(type == "clothing" || type == "body"){
    regex <- paste0("other.", type)
  } else {
    regex <- paste0("\\b", type, "\\b.*?\\gender\\b")
  }
  bonusColumn <- colnames(select(banned, matches(regex)))
  
  new <- banned %>%
    # First sort out any additional items that we added to our bonus column
    separate_rows_(bonusColumn, sep = ",") %>%
    separate(bonusColumn, into = c("item", "prohibited"), sep = ":") %>%
    mutate(type = type) %>% 
    select(c(1:6), c(item, prohibited, type)) %>% 
    filter(!is.na(prohibited)) %>% 
    mutate(prohibited = ifelse(prohibited == "NA", "none", prohibited))
  
  new2 <- banned %>%
    select(-contains("any.other")) %>% 
    gather(key = item, value = prohibited, colnames(select(., contains(!!type)))) %>%
    filter(!is.na(prohibited)) %>%
    mutate(item = gsub("(.*\\.{3})", "", item),
         item = gsub("\\.", " ", item),
         item = trimws(item)) %>%
    mutate(type = type) %>%
    select(c(1:6), c(item:type))
  
  combined <- rbind(new, new2)
}



clothingTypes <- c("accessories", "shirt", "skirt.dress", "pants", "shorts", "undergarment", "footwear", "headwear", "grooming", "body", "clothing")

longDressCode <- map_dfr(clothingTypes, elongateCode)

#write.csv(longDressCode, here("processed_data", "clean_dress_code.csv"), row.names = FALSE)

Exploring Data

Explicit Bans

Some schools do explicitly ban things for either male or female students.

How many schools explicitly ban things for male vs. female students?

explicitSchools <- longDressCode %>% 
  filter(prohibited != "none") %>% 
  separate_rows(prohibited, sep = ",") %>% 
  mutate(prohibited = trimws(prohibited),
         prohibited = case_when(
           prohibited == "boys" ~ "male",
           prohibited == "Na" ~ NA_character_,
           prohibited == "NA" ~ NA_character_,
           TRUE ~ prohibited
         )) %>% 
  filter(prohibited == "male" | prohibited == "female") %>% 
  group_by(School.Name, prohibited) %>% 
  count()

nrow(explicitSchools) / nrow(responses)
## [1] 0.2633136

Alright so 26% of schools have banned at least one item explicitly for either male or female students.

explicitCount <- longDressCode %>% 
  filter(prohibited != "none") %>% 
  separate_rows(prohibited, sep = ",") %>% 
  mutate(prohibited = trimws(prohibited),
         prohibited = case_when(
           prohibited == "boys" ~ "male",
           prohibited == "Na" ~ NA_character_,
           prohibited == "NA" ~ NA_character_,
           TRUE ~ prohibited
         )) %>% 
  filter(prohibited == "male" | prohibited == "female") %>% 
  group_by(School.Name, prohibited) %>% 
  count() %>% 
  spread(prohibited, n) %>%
  replace(is.na(.), 0) %>% 
  mutate(diff = (abs(male-female)) / ((male + female) / 2))

mean(explicitCount$diff)
## [1] 0.9213809

How many things are explicitly banned for male vs. female students?

explicitCount <- longDressCode %>% 
  filter(prohibited != "none") %>% 
  separate_rows(prohibited, sep = ",") %>% 
  mutate(prohibited = trimws(prohibited),
         prohibited = case_when(
           prohibited == "boys" ~ "male",
           prohibited == "Na" ~ NA_character_,
           prohibited == "NA" ~ NA_character_,
           TRUE ~ prohibited
         )) %>% 
  filter(prohibited == "male" | prohibited == "female") %>% 
  count(prohibited)

explicitCount
## # A tibble: 2 x 2
##   prohibited     n
##   <chr>      <int>
## 1 female       132
## 2 male         210

So it’s more common to explicitly ban things for male students.

Let’s take a look at what those things were.

explicit <- longDressCode %>% 
  filter(prohibited != "none") %>% 
  separate_rows(prohibited, sep = ",") %>% 
  mutate(prohibited = trimws(prohibited),
         prohibited = case_when(
           prohibited == "boys" ~ "male",
           prohibited == "Na" ~ NA_character_,
           prohibited == "NA" ~ NA_character_,
           TRUE ~ prohibited
         )) %>% 
  filter(prohibited == "male" | prohibited == "female") %>% 
  group_by(prohibited) %>% 
  count(item, sort = TRUE) %>% 
  group_by(prohibited) %>% 
  head(n = 10)

explicit
## # A tibble: 10 x 3
## # Groups:   prohibited [2]
##    prohibited item                                n
##    <chr>      <chr>                           <int>
##  1 male       earrings                           18
##  2 male       sleeveless                         16
##  3 female     midsection midriff                 11
##  4 female     piercings other than ear lobes     10
##  5 female     spaghetti string straps            10
##  6 male       piercings other than ear lobes      8
##  7 female     low necklines                       7
##  8 male       midsection midriff                  7
##  9 male       muscle shirts open sides            7
## 10 male       hairstyles that obstruct vision     6
ggplot(explicit, aes(x = item, y = n, fill = prohibited)) + geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

maleHair <- longDressCode %>% 
  filter(prohibited != "none") %>% 
  separate_rows(prohibited, sep = ",") %>% 
  mutate(prohibited = trimws(prohibited),
         prohibited = case_when(
           prohibited == "boys" ~ "male",
           prohibited == "Na" ~ NA_character_,
           prohibited == "NA" ~ NA_character_,
           TRUE ~ prohibited
         )) %>% 
  filter(prohibited == "male" | prohibited == "female") %>% 
  mutate(hair = grepl("collar|lobe", item))

Alright, so there aren’t too many items that are explicitly gendered. Let’s look at what types of items are banned overall and then I’ll get into things that are implicitly targetting students of a particular gender.

Body Parts

Which body parts are specifically prohibited the most on high school campuses?

Clothing Types

What types of clothing have been specifically prohibited?

Shirts

Shorts

Pants

Skirts

Accessories

Footwear

Undergarments

Headwear

Grooming

Clothing Attributes

Sometimes descriptive words are used instead of explicit clothing items (showing items prohibited by a minimum of 3 schools).

Rationale

Typically before listing the actual items that are prohibited, schools give a brief rationale for why their dress codes exist. What words show up the most there?

Targetted Prohibitions

I need to create a list of items that have been prohibited so I can get a better sense of how many are targetted towards specific types of students.

bannedItems <- longDressCode %>% 
  group_by(type, item) %>% 
  count(item, sort = TRUE)

bannedwithLength <- bannedItems %>% 
  ungroup() %>% 
  bind_rows(lengthByItem) %>% 
  arrange(desc(n))

write.csv(bannedItems, here::here("processed_data", "bannedItems.csv"), row.names = FALSE)

I’ll now upload this to Google Drive to make some manual annotations.

Let’s take a look again at the banned items highlighting the gender (if any) that these items implicitly target.

Body Parts

Which body parts are specifically prohibited the most on high school campuses?

Clothing Types

What types of clothing have been specifically prohibited?

Shirts

Shorts

Pants

Skirts & Dresses

Accessories

Footwear

Undergarments

Headwear

Grooming

Clothing Attributes

Sometimes descriptive words are used instead of explicit clothing items (showing items prohibited by a minimum of 3 schools).

If we were to look at the breakdown between the banned items that are implicitly targetted at female students vs. male students: (100% = 100% female targeted, 0% = 100% male targeted). This calculation does not include the banned items that are not implicitly targeted at either male or female students.

If we do look at the percentage of female gender-targetted items compared to all items, here is the distribution:

## Warning: Removed 1 rows containing non-finite values (stat_bin).

And male gender-targeted:

If we do look at the percentage of female gender-targetted items compared to all items, here is the distribution:

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (stat_bin).

Similarly, this is the percentage of items in the dresscode implicitly target students of color (with 100% = 100% poc targetted, and 0% = 0% poc targetted).

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

histogramData <- bannedGenderAll %>% 
  select(c(School.Name, pFem, pMal)) %>% 
  left_join(bannedRace) %>% 
  select(-c(n, y)) %>% 
  mutate(wCat = 0) %>% 
  mutate(femCat = case_when(
    pFem <= 5 ~ 0,
    between(pFem, 5, 10) ~ 1,
    between(pFem, 10, 15) ~ 2,
    between(pFem, 15, 20) ~ 3,
    between(pFem, 20, 25) ~ 4,
    between(pFem, 25, 30) ~ 5,
    between(pFem, 30, 35) ~ 6,
    between(pFem, 35, 40) ~ 7,
    between(pFem, 40, 45) ~ 8,
    between(pFem, 45, 50) ~ 9,
    between(pFem, 50, 55) ~ 10,
    between(pFem, 55, 60) ~ 11,
    between(pFem, 60, 65) ~ 12,
    between(pFem, 65, 70) ~ 13,
    between(pFem, 70, 75) ~ 14,
    between(pFem, 75, 80) ~ 15,
    between(pFem, 80, 85) ~ 16,
    between(pFem, 85, 90) ~ 17,
    between(pFem, 90, 95) ~ 18,
    between(pFem, 95, 100) ~ 19,
    TRUE ~ pFem
  ), 
  malCat = case_when(
     pMal <= 5 ~ 0,
    between(pMal, 5, 10) ~ 1,
    between(pMal, 10, 15) ~ 2,
    between(pMal, 15, 20) ~ 3,
    between(pMal, 20, 25) ~ 4,
    between(pMal, 25, 30) ~ 5,
    between(pMal, 30, 35) ~ 6,
    between(pMal, 35, 40) ~ 7,
    between(pMal, 40, 45) ~ 8,
    between(pMal, 45, 50) ~ 9,
    between(pMal, 50, 55) ~ 10,
    between(pMal, 55, 60) ~ 11,
    between(pMal, 60, 65) ~ 12,
    between(pMal, 65, 70) ~ 13,
    between(pMal, 70, 75) ~ 14,
    between(pMal, 75, 80) ~ 15,
    between(pMal, 80, 85) ~ 16,
    between(pMal, 85, 90) ~ 17,
    between(pMal, 90, 95) ~ 18,
    between(pMal, 95, 100) ~ 19,
    TRUE ~ pMal
  ),
  raceCat = case_when(
    pRac <= 5 ~ 0,
    between(pRac, 5, 10) ~ 1,
    between(pRac, 10, 15) ~ 2,
    between(pRac, 15, 20) ~ 3,
    between(pRac, 20, 25) ~ 4,
    between(pRac, 25, 30) ~ 5,
    between(pRac, 30, 35) ~ 6,
    between(pRac, 35, 40) ~ 7,
    between(pRac, 40, 45) ~ 8,
    between(pRac, 45, 50) ~ 9,
    between(pRac, 50, 55) ~ 10,
    between(pRac, 55, 60) ~ 11,
    between(pRac, 60, 65) ~ 12,
    between(pRac, 65, 70) ~ 13,
    between(pRac, 70, 75) ~ 14,
    between(pRac, 75, 80) ~ 15,
    between(pRac, 80, 85) ~ 16,
    between(pRac, 85, 90) ~ 17,
    between(pRac, 90, 95) ~ 18,
    between(pRac, 95, 100) ~ 19,
    TRUE ~ pRac
  )) %>% 
  select(-c(pFem, pRac, pMal)) %>% 
  gather(type, group, -School.Name) %>% 
  rename(school = School.Name) %>% 
  mutate(type = case_when(
    type == "femCat" ~ "f",
    type == "malCat" ~ "m",
    type == "raceCat" ~ "c",
    type == "wCat" ~ "w",
    TRUE ~ "other"
  ))
## Joining, by = "School.Name"
# Save into JS analysis folder
write.csv(histogramData, "../src/assets/data/histogramData.csv", row.names = FALSE, na = "")
intersections <- banned %>% 
  replace_na(list(race = "n", gender = "n")) %>% 
  group_by(gender, race) %>% 
  count()

intersections
## # A tibble: 6 x 3
## # Groups:   gender, race [6]
##   gender race     nn
##   <chr>  <chr> <int>
## 1 f      n      2128
## 2 f      y        48
## 3 m      n       504
## 4 m      y       980
## 5 n      n      2829
## 6 n      y        63

Demographic Data

Of the 300 schools I’ve collected data for so far, what is their state distribution?

byState <- responses %>% count(School.State.Abbreviation, sort = TRUE) %>% 
  mutate(state = state.name[match(School.State.Abbreviation, state.abb)]) %>% 
  mutate(state = tolower(state))

ggplot(byState, aes(x = reorder(School.State.Abbreviation, n), y = n)) + geom_bar(stat = "identity")

Let’s try to look at this on a map:

us <- map_data("state")

ggplot() + geom_map(data = us, map = us, aes(x = long, y = lat, map_id = region),
                    fill = "#ffffff", color="#ffffff", size = 0.15) +
  geom_map(data = byState, map = us, aes(fill = n, map_id = state)) + 
  scale_fill_continuous(low='thistle2', high='darkred', 
                                 guide='colorbar') 
## Warning: Ignoring unknown aesthetics: x, y

Alright, so Texas and NY are pretty high here and we’re missing some decent chunks of the country but overall, not terrible.

Let’s combine our demographic data with our dress code data

shuffledHandbooks <- read.csv(here::here("processed_data", "shuffledHandbooks.csv"), stringsAsFactors = FALSE, header = TRUE) %>% 
  mutate(schoolName = trimws(schoolName),
         stateAbb = trimws(stateAbb)) 

demo <- banned %>% 
  mutate(schoolName = trimws(School.Name),
         stateAbb = trimws(School.State.Abbreviation)) %>% 
  left_join(shuffledHandbooks)
## Joining, by = c("schoolName", "stateAbb")

Are schools with a lower percentage of white students more likely to implicitly target students of color?

raceDemo <- demo %>% 
  left_join(bannedRace, by = "School.Name")

ggplot(raceDemo, aes(x = pWhite, y = pRac)) + geom_point() + xlab("percentage of white students") + ylab("percentage of implicitly racial prohibited items")
## Warning: Removed 7 rows containing missing values (geom_point).

This appears to be either flat, or slightly the opposite of what I expected. Let’s look for a quick correlation.

cor(raceDemo$pRac, raceDemo$pWhite)
## [1] NA

That is pretty low, looks like there isn’t much of a correlation between the population of white students and items specifically targetted at students of color. However, that may change if we look specifically at the population of black students (since many of these banned items are specifically targeted at black students).

ggplot(raceDemo, aes(x = pBlack, y = pRac)) + geom_point()+ xlab("percentage of black students") + ylab("percentage of implicitly racial prohibited items")
## Warning: Removed 7 rows containing missing values (geom_point).

Again, really flat.

cor(raceDemo$pBlack, raceDemo$pRac)
## [1] NA

Slightly higher correlation than with white students, but still nothing explanatory. Maybe take a quick look at the state distribution.

ggplot(raceDemo, aes(reorder(School.State.Abbreviation, pRac), pRac)) + geom_point() + xlab("state") + ylab("percentage of implicitly racial prohibited items")

There’s a pretty wide spread within and between schools in a single state. Maybe there’s a difference in gender-targetted policies within/between states.

genDemo <- demo %>% 
  left_join(bannedGender, by = "School.Name")

ggplot(genDemo, aes(reorder(School.State.Abbreviation, pFem), pFem)) + geom_point() + xlab("state") + ylab("percentage of implicitly gender prohibited items")
## Warning: Removed 19 rows containing missing values (geom_point).

Not so much. Perhaps gender differences by the school locale?

ggplot(genDemo, aes(reorder(locale, pFem), pFem)) + geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("school locale") + ylab("percentage of implicitly gender prohibited items")
## Warning: Removed 19 rows containing missing values (geom_point).

Not so much. Perhaps race differences by the school locale?

ggplot(raceDemo, aes(reorder(locale, pRac), pRac)) + geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("school locale") + ylab("percentage of implicitly racial prohibited items")

Seemingly a few more outliers here, but no major trends.

Things Left that I could look at

  • Sanctions